BDDC by a Frontal Solver and the Stress 
Computation in a Hip Joint Replacement 



Jakub Sfstek a ' d , Jaroslav Novotny b , Jan Mandel c , 
Marta Certfkova a , Pavel Burda a 

00 

a Department of Mathematics, Faculty of Mechanical Engineering 
Czech Technical University in Prague 

Ch b Department of Mathematics, Faculty of Civil Engineering 

I ^ Czech Technical University in Prague 

£\\ c Department of Mathematical and Statistical Sciences, 

University of Colorado Denver 

d Institute of Thermomechanics, Academy of Sciences of the Czech Republic 

Z, 

■5 

S 



> 



Abstract 

A parallel implementation of the BDDC method using the frontal solver is employed 
to solve systems of linear equations from finite element analysis, and incorporated 
into a standard finite element system for engineering analysis by linear elasticity. 
<3\ Results of computation of stress in a hip replacement are presented. The part is 

made of titanium and loaded by the weight of human body. The performance of 
BDDC with added constraints by averages and with added corners is compared. 
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1 Introduction 



Parallel numerical solution of linear problems arising from linearized isotropic 
elasticity discretized by finite elements is important in many areas of 
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engineering. The matrix of the system is typically large, sparse, and ill- 
conditioned. The classical frontal solver [S] has became a popular direct 
method for solving problems with such matrices arising from finite element 
analyses. However, for large problems, the computational cost of direct 
solvers makes them less competitive compared to iterative methods, such 
as the preconditioned conjugate gradients (PCG) jl]. The goal is then to 
design efficient preconditioners that result in a lower overall cost and can 
be implemented in parallel, which has given rise to the field of domain 
decomposition and iterative substructuring [T7] . 

The Balancing Domain Decomposition based on Constraints (BDDC) [6] is 
one of the most advanced preconditioners of this class. However, the additional 
custom coding effort required can be an obstacle to the use of the method in an 
existing finite element code. We propose an implementation of BDDC built 
on top of common components of existing finite element codes, namely the 
frontal solver and the element stiffness matrix generation. The implementation 
requires only a minimal amount of additional code and it is therefore of 
interest. For an important alternative implementation of BDDC, see [TT] . 

BDDC is closely related to FETI-DP [7]. Though the methods are quite 
different, they can be built largely from the same components, and the 
eigenvalues of the preconditioned problem (other than the eigenvalue equal to 
one) in BDDC and FETI-DP are the same [13]. See also [2TTTlfT3] for simplified 
proofs. Thus the performance of BDDC and FETI-DP is essentially identical, 
and results for one method apply immediately to the other. 

The frontal solver was used to implement a limited variant of BDDC in 
PfTU] . The implementation takes advantage of the existing integration of the 
frontal solver into the finite element methodology and of its implementation 
of constraints, which is well-suited for BDDC. However, the frontal solver 
treats naturally only point constraints, while an efficient BDDC method 
in three dimensions requires constraints on averages. This fact was first 
observed for FETI-DP experimentally in [7], and theoretically in [10J, but 
these observations apply to BDDC as well because of the equivalence between 
the methods. 

In this paper, we extend the previous implementation of BDDC by the frontal 
solver to constraints on averages and apply the method to a problem in 
biomechanics. We also compare the performance of the mehod with adding 
averages and with additional point constraints. The implementation relies 
on the separation of point constraints and enforcing the rest by Lagrange 
multipliers, as suggested already in [6]. One new aspect of the present approach 
is the use of reactions, which come naturally from the frontal solver, to avoid 
custom coding. 
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2 Mathematical formulation of BDDC 



Consider the problem in a variational form 



a(u,v) = (f,v) WveV, 



(1) 



where V is a finite element space of M 3 -valued piecewise polynomial functions 
v continuous on a given domain n C R 3 , satisfying homogeneous Dirichlet 
boundary conditions, and 



Here A and /i are the first, and the second Lame's parameter, respectively. 
Solution u G V represents the vector field of displacement. It is known that 
a(u, v) is a symmetric positive definite bilinear form on V. An equivalent 
formulation of (HI) is to find a solution u to a linear system 



where A = (a^) is the stiffness matrix computed 4>j), where {fa} 

is a finite element basis of V, corresponding to set of unknowns, also called 
degrees of freedom, defined as values of displacement at the nodes of a given 
triangulation of the domain. The domain Q is decomposed into nonoverlapping 
subdo mains Q iy i = 1, . . . N, also called substructures. Unknowns common to 
at least two subdomains are called boundary unknowns and the union of all 
boundary unknowns is called the interface T. 

The first step is the reduction of the problem to the interface. This is quite 
standard and described in the literature, e.g., [T7j. The space V is decomposed 
as the a-orthogonal direct sum V — V\ © • • • © Vjv © Vr, where Vi is the space 
of all functions from V with nonzero values only inside (in particular, 
they are zero on T), and Vr is the a-orthogonal complement of all spaces 
Vr = {v G V : a(v,w) = Ww G Vj, i — 1,...N}. Functions from 
Vr are fully determined by their values at unknowns on T and the discrete 
harmonic condition that they have minimal energy on every subdomain (i.e. 
solve the system with zero right hand side in corresponding equations). They 
are represented in the computation by their values on the interface T. Solution 
u may be split into the sum of interior solutions u Q = u u u i and 
ur G Vr- Then problem ^ may be rewritten as 



Let us now write problem (|4]) in the block form, with the first block 
corresponding to unknowns in subdomain interiors, and the second block 





Au = f, 



(3) 




(4) 
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corresponding to unknowns at the interface, 



An 


A 12 




Un + u ol 




fl 


A21 


A22 




UY2 + U o2 




A 



(5) 



with u o2 = 0. Using the fact that functions from Vr are energy orthogonal 
to interior functions (so it holds A n un + Ai 2 ur2 = 0) and eliminating the 
variable u \, we obtain that ^ is equivalent to 



An A 12 
A 2 i A 22 

and the solution is obtained as u 
problem 



A n u i 

UT2 



fl 





A 2 iu ol 



(6) 



(7) 



ur + u a . Problem ^ is equivalent to the 



An A 12 




un 







S 




UT2 
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where S is the Schur complement with respect to interface: S = A22 — 
A2iA n 1 Ai2, and #2 is the condensed right hand side g 2 = f 2 — A 2 iu i = 
f 2 — A 2 iA{l fi. Problem (j8| can be split into two problems 

AnUvi = -A 12 u r2 , (9) 

Su r2 = g 2 - (10) 
Since An has a block diagonal structure, the solution to dol) may be found in 



parallel and similarly the solution to (|9j). 

The BDDC method is a particular kind of preconditioner for the reduced 



problem (10). The main idea of the BDDC preconditioner in an abstract form 
[H] is to construct an auxiliary finite dimensional space W such that Vr C W 
and extend the bilinear form a (•, •) to a form a (•, •) defined on W x W and 
such that solving the variational problem Q with a (•, •) in place of a (•, •) is 
cheaper and can be split into independent computations done in parallel. Then 
the solution projected to Vr is used for the preconditioning of S. Specifically, 
let E : W — > Vr be a given projection of W onto Vr, and r 2 = g 2 — Sur2 the 
residual in a PCG iteration. Then the output of the BDDC preconditioner is 
the part v 2 of v = Ew, where 



w e W : a (w, z) = (r, Ez) Vz G W, 



(11) 





T\ 




V\ 


r = 




V = 






T2 




V2 
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In terms of operators, v = ES l E T r, where S is the operator on Vr associated 
with the bilinear form 5 (but not computed explicitly as a matrix). 

Note that while the residual r 2 in the PCG method applied to the reduced 



problem is given at the interface only, the residual in (11) has the dimension of 
all unknowns on the subdomain. This is corrected naturally by extending the 
residual r 2 to subdomain interiors by zeros (setting r\ — 0), which is required 
by the condition that the solution v is discrete harmonic inside subdomain. 
Similarly, only interface values t>2 of v are used in further PCG computation. 
Such approach is equivalent to computing with explicit Schur complements. 

The choice of the space W and the projection E determines a particular 
instance of BDDC (HJH]. All functions from Vr are continuous on the domain 
Q. In order to design the space W, we relax the continuity on the interface 
r. On T, we select coarse degrees of freedom and define W as the space of 
finite element functions with minimal energy on every subdomain, continuous 
across T only at coarse degrees of freedom. The coarse degrees of freedom can 
be of two basic types - explicit unknowns (called coarse unknowns) at selected 
nodes (called corners), and averages over larger sets of nodes (subdomain 
faces or edges). The continuity condition then means that the values at the 
corresponding corners, resp. averages, on neighbouring subdomains coincide. 
The bilinear form a (•, •) from ^ is extended to a (•, •) on Wx W by integrating 
Q over the subdomains f2j separately and adding the results. 

The projection E : W — > Vr is defined at unknowns on the interface T (the 
Ur2 part) as a weighted average of values from different subdomains and thus 
resulting in function continuous across the interface. These averaged values 
on T determine the projection E, because values inside subdomains (the un 
part) are then obtained by the solutions of local subdomain problems ^ to 
make the averaged function discrete harmonic. To assure good performance 
regardless of different stiffness of the subdomains [12], the weights are chosen 
proportional to the corresponding diagonal entries of the subdomain stiffness 
matrices. The transposed projection E T is used for distribution of the residual 
r 2 among neighbouring subdomains and represents the decomposition of unity 
at unknowns on interface. 

The decomposition into subspaces used to derive the problem with Schur 



complement (10) is now repeated for space W, with the coarse degrees 
of freedom playing the role of interface unknowns and a(-, •) the role of 
a(-,-). Namely, space W is decomposed as a-orthogonal direct sum W = 
W\ © ■ • • © Wn © Wc, where Wi is the space of functions with nonzero values 
only in fli outside coarse degrees of freedom (they have zero values at corners, 
they are generally not continuous at other unknowns on T, and they have zero 
averages) and Wc is the coarse space, defined as the a-orthogonal complement 
of all spaces Wf W c = {v G W : a(v,w) = Vu> G W i} i = 1,...N}. 
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Functions from Wc are fully determined by their values at coarse degrees of 
freedom (where they are continuous) and have minimal energy. Thus, they 
are generally discontinuous across T outside the corners. The solution w G W 
from (11) is now split accordingly as w = Wc + Y^i=i w ii where wc, determined 

wc G Wc : a (w c ,v) = (r, Ev) \/v G Wc, (12) 
is called the coarse correction, and Wi, determined by 

WiEWi-.a (wt, v) = (r, Ev) \/v G W h (13) 

is the substructure correction from % = 1, . . . N. 

Let us now rewrite the BDDC preconditioner in terms of matrices, following 



[6] . Problem ( 13 ) is formulated in a saddle point form as 







Wi 






Ci _ 




Hi 








(14) 



where Ki denotes the substructure local stiffness matrix, obtained by the 
subassembly of element matrices only of elements in substructure i, matrix 
Ci represents constraints on subdomain, that enforce zero values of coarse 
degrees of freedom, //j is vector of Lagrange multipliers, and is the weighted 
residual E T r restricted to subdomain i. 

Matrix Ki is singular for floating subdomains (subdomains not touching 



Dirichlet boundary conditions), while the augmented matrix of problem (14) is 
regular and may be factorized. Matrix Ci contains both constraints enforcing 
continuity across corners (single point continuity), and constraints enforcing 
equality of averages over edges and faces of subdomains. The former type 
corresponds to just one nonzero entry equal to 1 on a row of Cj, while the 
latter leads to several nonzero entries on a row. This structure will be exploited 
in the following section. 



Problem (14) is solved in each iteration of the PCG method to find the 



correction from substructure i. However, the matrix of (14) is used prior the 
whole iteration process to construct the local subdomain matrix of the coarse 
problem. First, the coarse basis functions are found independently for each 
subdomain as the solution to 



(15) 



This is a problem with multiple right hand sides, where ipi is a matrix of coarse 
basis functions with several columns, each corresponding to one coarse degree 
of freedom on subdomain. These functions are given by values equal to at all 







■0i 







Ci _ 




A 
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coarse degrees of freedom except one, where they have value equal to 1, and 
they have minimal energy on subdomain outside coarse degrees of freedom. 
The identity block / has the dimension of the number of constraints on the 
subdomain. 

Once ipi is known, the subdomain coarse matrix Kc% is constructed as 

K Ci = tfKtfi. (16) 



Matrices Kc% are then assembled to form the global coarse matrix Ac- This 
procedure is same as the standard process of assembly in finite element 
solution, with subdomains playing the role of elements, coarse degrees of 
freedom on subdomain representing degrees of freedom on element, and matrix 
Kci representing the element stiffness matrix. 



Problem (12) is now 

A c w c = r c , (17) 

where rc is the global coarse residual obtained by the assembly of the 
subdomain contributions of the form rci = il ) f r i- 

The coarse solution wc has the dimension of the number of all coarse degrees 
of freedom. So, to add the correction to subdomain problems, we first have to 
restrict it to coarse degrees of freedom on each subdomain and to interpolate 
it to the whole subdomain by wci = i>iwa- By extending wc% and Wi by zero 
to other subdomains, these can be summed over the subdomains to form the 
final vector w. Finally, the preconditioned residual is obtained as v = Ew. 

It is worth noticing that in the case of no constraints on averages, i.e. using 
only coarse unknowns for the definition of the coarse space, matrix Ac of 



problem (17) is simply the Schur complement of matrix A with respect to 
coarse unknowns. This fact was pointed out in [11]. If additional degrees of 
freedom are added for averages, they correspond to new explicit unknowns in 

Obviously, several mapping operators among various spaces are needed in 
the implementation, defining embedding of subdomains into global problem, 
local subdomain coarse problem into global coarse problem etc. We have 
circumvented their mathematical definition by words for the sake of brevity, 
while we refer to [6TIT2"] for rigorous definitions of these operators. 
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3 BDDC implementation based on frontal solver 



The frontal solver implements the solution of a square linear system with some 
of the variables having prescribed values. Equations that correspond to the 
fixed variables are omitted and the values of these variables are substituted into 
the solution vector directly. The output of the solver consists of the solution 
and the resulting imbalance in the equations, called reaction forces. More 
precisely, consider a block decomposition of the vector of unknowns x with 
the second block consisting of all fixed variables, and write a system matrix A 
with the same block decomposition (here, the decomposition is different from 
the one in Section [2]). Then on exit from the frontal solver, 



An A l2 
A21 A 2 2 





Xi 




fl 


+ 







x 2 




/a 




r 2 



where fixed variable values X2 and the load vectors f\ and f 2 are the inputs, 
while the solution x\ and the reaction r 2 are the outputs. Stiffness matrices 
of elements are input instead of the whole matrix, and their assembly is done 
simultaneously with the factorization inside the frontal solver. 

The key idea of this section is to split the constraints in matrix Cj and to 
handle them in different ways. Those enforcing zero values at corners will be 
enforced as fixed variables, while the remaining constraints, corresponding to 
averages and denoted Cj, will be still enforced using Lagrange multipliers. the 

In the rest of this section, we drop the subdomain subscript % and we write 
subdomain vectors w in the block form with the second block consisting 
of unknowns that are involved in coarse degrees of freedom (i.e. coarse 
unknowns), denoted by the subscript c , and the first block consisting of the 
remaining degrees of freedom, denoted by the subscript /. The vector of the 
coarse degrees of freedom given by averages is written as Cw, where each row 
of C contains the coefficients of the average that makes that degree of freedom. 
Then subdomain vectors w € W are characterized by w c = 0, Cw = 0. Assume 
that C = [Cf C c ], with C c = 0, that is, the averages do not involve single 
variable coarse degrees of freedom; then Cw = C/Wf. The subdomain stiffness 
matrix K is singular for floating subdomains, but the block Kff is nonsingular 
if there are enough corners to eliminate the rigid body motions, which will be 
assumed. 



We now show how to solve ( 14 ) - ( 17 ) using the frontal solver. In the case when 
there are no averages as coarse degrees of freedom, we recover the previous 
method from [3]fTp] . 



The local substructure problems (13) are written in the frontal solver form 
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(pi) as 



Kf f K fc Cj 
K cf K cc 
C, 





Wf 




r 









w c 







+ 


Rea 




n 













(19) 



where w c = 0, r is the part in the / block of the residual in the PCG method 
distributed to the substructures by the operator E T , and Rea is the reaction. 
The constraint w c = is enforced by marking the w c unknowns as fixed, while 
the remaining constraints CfWf = are enforced via the Lagrange multiplier 
fi. Using the fact that w c = 0, we get from (19) that 



K ff w f = 
K c fWf = Rea 
Cfw f = 0. 



Cjfi + r, 



(20) 
(21) 
(22) 



From (20|) 
problem 



Wf = Kjj [—Cj A* + r )- Now substituting Wf into (22), we get the 
or Lagrange multiplier //, 



-Cf- 



C f K ff C Jv 



CfKjjr. 



(23) 



The matrix CfKjfCj is dense but small, with the order equal to the number 
of averages on the subdomain, and it is constructed by solving the system 
KffU = Cj with multiple right hand sides by the frontal solver and then the 



multiplication CfU. After solving problem (23), we substitute for fi in (20) 
and find Wf from 



Kff 


Kfc 




Wf 




'r-Cjfi' 


+ 

















_K c f 


K cc 




w c 









Rea 



(24) 



by the fronta 



solver, considering w c = fixed. The factorization in the frontal 



solver for (24) and the factorization of the matrix CfKffCj for (23) need to 
be computet! only once in the setup phase. 



The coarse problem (17) is solved by the frontal solver just like a finite 
element problem, with the subdomains playing the role of elements. It only 
remains to specify the basis functions of Wc on the subdomain from ( fisj ), 
and compute the local subdomain coarse matrix (16) efficiently. Denote by 



ip c the matrix whose colums are coarse basis functions associated with the 
coarse unknowns at corners, and ip av9 the matrix made out of the coarse 
basis functions associated with averages. To find the coarse basis functions, 



we proceed similarly as in (19) and write the equations for the coarse basis 
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functions in the frontal solver form, now with multiple right-hand sides, 



K f , K fc Cj 
K cf K cc 

c f o 



Ipj ifjf 9 







I 







A c X avg 




I 





Rea c Rea av 9 




(25) 



where Rea c and Rea av9 are matrices of reactions. Denote ipf 



R 



' avg 
'c rc 



/ 



. Then (25) becomes 



/ 



A 



A c X av9 



, Rea 



Rea c Rea av9 



and 



K ff 4> f + K fc ij c -- 
K c fipf + K cc ip c = Rea 
C f ^ f = R. 



cjx, 



(26) 
(27) 
(28) 



From (26), we get ipf = ~Kfj (Kfc^c + C/^V Substituting ^ into (28), we 
derive xhe problem for Lagrange multipliers 



CfKjICjX = -(R + C f Kj}K fc ^p. 



(29) 



which is solved for A by solving the system (29) for multiple right hand sides. 
Since ip c is known, we can use the frontal solver to solve (26)-(27) to find "0/ 
and Rea: 



K„ 


K fc 








'-cjx 


+ 







K cc _ 













Rea 



(30) 



considering %p c fixed. Finally, we construct the local coarse matrix 
corresponding to the subdomain as 



K c = ip T Kip = i) T 



-CJX 
Rea 



1% € 



-cjx 

Rea 



tfcjx + 



Rea, 
(31) 



where i/j 



avg 



At the end of the setup phase, the matrix of coarse problem is factored by the 
frontal solver, using subdomain coarse matrices as input. 
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4 The algorithm 



The setup starts with two types of factorization of the system matrix by frontal 
algorithm that differ only in the set of fixed variables. In the first factorization, 
all interface unknowns are prescribed as fixed, for the solution of problems ^ 
and ([7]). In the second factorization, only the coarse unknowns are fixed, for the 
solution of problems (24) and (30). Both factorizations are done subdomain by 



subdomain, so the interface unknowns are represented by their own instance 
in different subdomains. This naturally leads to parallelization according to 
subdomains. Then the main steps of the algorithm are as follows. 

(A) Solve ^ in parallel on every subdomain for g 2 as reaction using the frontal 
solver with w o2 = fixed, 



An A n 
A-21 A 2 2 





Uol 




h 


+ 












h 




-92 



(32) 



(B) Solve (10) for u^2 using PCG with BDDC preconditioner (for more details 
see bellow). 

(C) Compute the solution u from ^ using the frontal solver with w r2 fixed 
and u o2 = 0. 

Step (B) in detail follows. Before starting cycle of PCG iterations, compute in 
advance: 



side of (30) is computed from (29). 



Matrix CfKjjCj of (23), resp. rt29h and its factorization. 

Coarse basis functions xpi of (15) horn (30). Multiplier A on the right hand 



Local matrices Kqi of (16) from (|31|) 
Factorization of matrix Ac of (17 
matrices Kci as 'element' matrices. 



using the frontal solver with local coarse 



The first residual r 2 of (10) from the first approximation of Wr 2 using the 



frontal solver on ^ with «r 2 fixed 



An A12 
A21 A22 





Uvi 







+ 







UT2 




92 




-r 2 



(33) 



For the popular choice of initial solution ut2 = 0, this step reduces to setting 
r 2 — 92 (the discrete harmonic extension of zero function on interface is zero 
also in the interior of subdomain). 

Then in every PCG iteration, compute t> 2 from r 2 in three steps: 

(1) Compute E T r by distributing the residual r 2 on interface T among 
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(2) 



(3) 



neigbouring subdomains. 

For every subdomain, compute as restriction of E T r to that subdomain. 
Compute w = S~ 1 E T r as sum of all substructure corrections and 
coarse correction wc- These can be computed in parallel for every rf 
• Substructure correction Wi given by (14) is computed from (24) using 
frontal with w c = fixed. Multiplier fi on the right hand side of (24 ) 



is computed from ( 23 ) and r in both ( 24 ) and ( 23 ) is the part in the f 

block of the fj. 
• Coarse correction wc is solution of (17). 
We are interested in values of w only on interface. 

Compute the v 2 part of v — Ew at every interface node as weighted 
average of values of w at that node (neigbouring subdomains have 
generally different values of w at corresponding interface nodes). 



Note that in every iteration of PCG, the product Sp 2 is needed, where p 2 is 
a search direction. This product is computed using the frontal solver with p 2 
fixed, 



A n 


A 12 




Pi 






















+ 




A 21 


A 2 2_ 




_P2_ 









Sp 2 



(34) 



The interior part pi is computed only as a by-product, and it is not used in 
the PCG iterations. 



5 Numerical results 



The implementation was first tested on the problem of unit cube, a classical 
test problem of domain decomposition methods. In our case, the cube is made 
of steel with Young's modulus 2.1 • 10 11 Pa and Poisson's ratio 0.3. The cube 
is fixed at one face and loaded by the force of 1, 000 N, acting on one edge 
opposite to the fixed face in direction parallel to it and pointing outwards of the 
cube. The mesh consists of 32 3 = 32, 768 trilinear elements. It was uniformly 
divided into 8 and 64 subdomains, resulting in H/h = 16 and H/h = 8, 
respectively (here H denotes the characteristic size of subdomains and h the 
characteristic size of elements). These divisions are presented in Figure [l] The 
interface is initially divided into 7 corners, 6 edges, and 12 faces in the case of 
8 subdomains, and into 81 corners, 108 edges, and 144 faces in the case of 64 
subdomains. 

All experiments with this problem were computed on 8 1.5 GHz Intel Itanium 2 
processors of SGI Altix 4700 computer in CTU Supercomputing Centre, 
Prague. The stopping criterion of PCG was chosen as IMh/IMh < 10~ 6 . 
In the presented results, an external parallel multifrontal solver MUMPS [1J 
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was used for the factorization and solution of the coarse problem (17), instead 
of the serial frontal solver. 



The first experiment compares two ways of enriching the coarse space Wc, 
namely by adding point constraints on randomly selected variables on the 
substructure interfaces, i.e. adding more "corners" (Fig. [2] |4]) , and by adding 
averages to the initial set of corners (Table [I] and Table [2]). In the first column 
of these tables, no additional averages are considered and only corners were 
used in the construction of Wc- Then we enforce the equality of arithmetic 
averages over all edges, over all faces, and over all edges and faces, respectively. 



Although adding corners leads to an improvement of preconditioner in terms 
of the condition number (Fig. [3]) and the number of iterations (Fig. |2| , after a 
slight decrease early on the total computational time increases (Fig. |4| due to 
the added cost of the setup and factorization of the coarse problem. This effect 
is particularly pronounced with 8 subdomains, where the cost of creating the 
coarse matrix dominates, as the frontal solver internally involves multiplication 
of large dense matrices to compute reactions: (30) takes O(njn^) for multiple 
right hand sides, where is the number of variables and n C i is the number 
of coarse variables in subdomain i. The problem divided into 64 subdomains 
requires much less time than the problem with 8 subdomains also due to 
the fact that the factorization time for subdomain problems grows fast with 
subdomain size. Note that for 64 subdomains, the number of processors 
remains the same and each of the 8 processors handles 8 subdomains. 



The structural analysis of the replacement of the hip joint construction loaded 
by pressure from body weight is an important problem in bioengineering. The 
hip replacement consists of several parts made of titanium; here we consider 
the central part of the replacement joint. The problem was simplified to 
stationary linearized elasticity. The highest stress was reached in the notches of 
the holders. In the original design, holders of the hip replacement had thickness 
of 2 mm, which led to maximal von Mises stress about 1, 500 MPa. As the yield 
point of titanium is about 800 MPa, the geometry of the construction had to 
be modified. The thickness of the holders was increased to 3 mm, radiuses of 
the notches were increased, and the notches were made smaller, as in Fig. |5j 
The maximal von Mises stress on this new construction was only about 540 
MPa, which satisfed the demands for the strength of the construction [5]fl8] . 
The mesh consists of 33,186 quadratic elements resulting in 544,734 unknowns. 



The computation needs 400 minutes when using a serial frontal solver on 
Compaq Alpha server ES47 at the Institute of Thermomechanics, Academy 
of Sciences of the Czech Republic. With 32 subdomains and corner coarse 
degrees of freedom only, BDDC on a single Alpha processor took 10 times 
less, only 40 minutes. 
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In this paper, we present results for decompositions into 16 and 32 subdomains 
obtained by the METIS graph partitioner [5]. In the case of 16 subdomains, 
the interface topology leads to 35 corners, 12 edges, and 35 faces, and in the 
case of 32 subdomains, to 57 corners, 12 edges, and 66 faces. All presented 
results were obtained on 16 processors of SGI Altix 4700. Again, the stopping 
criterion of PCG was chosen as ||?i|2/||fli|2 < 10~ 6 . Also these results were 
obtained using MUMPS solver for the coarse problem solution. 

We again investigate the two approaches to coarse space enrichment. The 
results of random addition of corners are presented in Figures [7] -|9j Unlike in 
the case of the cube, a substantial decrease of the total time is achieved. The 
optimal number of additional corners depends on the division into subdomains. 

Results of adding averages are summarized in Table [3] for the case of 16 
subdomains, and in Table [4] for the case of 32 subdomains, both with the 
initial set of corners. 

The last experiment combines both approaches: we add averages to the optimal 
size of the set of corners, determined from Figure [9] as ~335 for the problem 
with 16 subdomains, and ~557 for the problem with 32 subdomains. These 
results are presented in Tables [5] and [6j We can see that this synergy can lead 
to the lowest overall time of the computation. 

We observe that while the implementation of averages leads to a negligible 
increase in the computational cost of the factorizations, it considerably 
improves the condition number, and thus reduces the overall time of solution. 
Also, the decomposition into 32 subdomains leads to significantly lower 
computational times than the division into 16 subdomains. Note that although 
the initial set of corners leads to non-singular local matrices and the coarse 
matrix and so successful setup of the preconditioner, the iterations do not 
converge in this case. 



6 Conclusion 

We have presented an application of a standard frontal solver within the 
iterative substructuring method BDDC. The method was applied to a 
biomechanical stress analysis problem. The numerical results show that the 
improvement of preconditioning by additional constraints is significant and 
can lead to a considerable savings of computational time, while the additional 
cost is negligible. 

For a model problem (cube), constraints by averages on edges and faces 
are required for good performance, as predicted by the theory [TOUTS'] , and 
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additional point constraints (i.e., corners) are not productive. However, for 
the hip replacement problem (which is far from a regularly decomposed cube), 
additional point constraints result in significantly lower total computational 
time than the averages, and the best result is obtained by combining both the 
added point constraints and the averages. 

For large problems and a large number of processors, load balancing will be 
essential [T5] . 
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Fig. 1. Cube problem, division into 8 (left) and 64 (right) subdomains 
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Fig. 2. Cube problem, number of iterations for adding corners 



coarse problem 


corners 


corners +edges 


corners +f aces 


corners+edges+faces 


iterations 


38 


19 


17 


13 


cond. number est. 


117 


15 


65 


7 


factorization (sec) 


49 


56 


52 


57 


peg iter (sec) 


21 


11 


10 


8 


total (sec) 


85 


85 


80 


83 



Table 1 

Cube problem, 8 subdomains, adding averages, initial set of corners 
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Condition number 
SGI All* 4700 
8 processors 
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Fig. 3. Cube problem, condition number for adding corners 
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Fig. 4. Cube problem, wall clock time for adding corners 
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coarse problem 


corners 


corners +edges 


corners +f aces 


corners+edges+faces 


iterations 


42 


16 


24 


11 


cond. number est. 


55 


8 


27 


4 


factorization (sec) 


2.2 


3.2 


2.8 


4.0 


peg iter (sec) 


7.5 


3.7 


5.1 


3.6 


total (sec) 


11.6 


8.8 


10.4 


9.6 



Table 2 

Cube problem, 64 subdomains, adding averages, initial set of corners 




Fig. 5. Hip joint replacement, von Mises stresses in improved design. 




Fig. 6. Hip joint replacement, division into 32 subdomains 
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Number of PCQ iterations 
SGI All* 4700 
1 6 processors 
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Fig. 7. 
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Hip joint replacement, number of iterations for adding corners 
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Fig. 8. Hip joint replacement, condition number for adding corners 
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Wall times for variable coarse problem 
SGI All)! 4700 
1 6 processors 
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Fig. 9. Hip joint replacement, wall clock time for adding corners 



coarse problem 


corners 


corners+edges 


corners+faces 


corners+edges+faces 


iterations 


181 


171 


69 


62 


cond. number est. 


4,391 


3,760 


535 


522 


factorization (sec) 


100 


70 


86 


80 


peg iter (sec) 


241 


216 


94 


87 


total (sec) 


380 


321 


216 


203 



Table 3 

Hip joint replacement, 16 subdomains, adding averages, 35 corners 



coarse problem 


corners 


corners+edges 


corners+faces 


corners +edges +f aces 


iterations 


>500 


>500 


137 


70 


cond. number est. 


n/a 


n/a 


n/a 


n/a 


factorization (sec) 


79 


65 


53 


52 


peg iter (sec) 


>545 


>547 


166 


87 


total (sec) 


>651 


>638 


236 


161 



Table 4 

Hip joint replacement, 32 subdomains, adding averages, 57 corners 
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coarse problem 


corners 


corners+edges 


corners+faces 


corners +edges +f aces 


iterations 


35 


34 


26 


26 


cond. number est. 


96 


96 


65 


65 


factorization (sec) 


91 


80 


78 


106 


peg iter (sec) 


53 


49 


38 


37 


total (sec) 


183 


166 


153 


181 



Table 5 

Hip joint replacement, 16 subdomains, adding averages, 335 corners 



coarse problem 


corners 


corners+edges 


corners+faces 


corners+edges+faces 


iterations 


35 


32 


30 


27 


cond. number est. 


149 


70 


59 


46 


factorization (sec) 


60 


57 


59 


62 


peg iter (sec) 


49 


40 


37 


34 


total (sec) 


128 


115 


113 


113 



Table 6 

Hip joint replacement, 32 subdomains, adding averages, 557 corners 
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